Introduction to R

ResBaz Victoria 2024

Introducing R

R is a language and an environment for working with data.

It has a large community of users and developers, and many specialized packages.

We will primarily work with data by writing R code.

Why code?

If every step of your analysis is recorded in an R script, with no manual steps:

  • you have a complete record of what you have done
  • changes are easily tested, and poor early decisions are easily fixed
  • today’s big project becomes a function in a package and serves as tomorrow’s building block
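
As a rough illustration of the points above, here is a minimal sketch of an analysis step written as a reusable function. It uses R's built-in `mtcars` dataset; the function name `summarise_by_group` is hypothetical and chosen only for this example.

```r
# A hypothetical analysis step, written once as a reusable function.
# `summarise_by_group` is an illustrative name, not part of any package.
summarise_by_group <- function(data, value, group) {
  # tapply() applies mean() to `value` within each level of `group`
  tapply(data[[value]], data[[group]], mean)
}

# Every step is code, so rerunning the script reproduces the result.
mpg_by_cyl <- summarise_by_group(mtcars, "mpg", "cyl")
print(round(mpg_by_cyl, 1))
```

Because the step is a function, the same code can be reused on a different dataset or column by changing only the arguments.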

Programming is an essential part of reproducible research.

  • other researchers can precisely understand and verify your work


R is open-source and free, so others can use your code without any barriers.

Data analysis follows a script

Diagram from “R for Data Science” book (https://r4ds.hadley.nz/)

Model could mean:

  • Summarize data with counts, means, etc.
  • More generally, “fit a model” to the data.
    • Machine learning or AI-based models (pictured: a self-portrait by ChatGPT)
  • Using the model, perform statistical tests.
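
The three meanings of “model” listed above might look like this in base R. This is only a sketch using the built-in `mtcars` dataset; the choice of variables (`mpg`, `wt`, `cyl`) is an assumption made for illustration, not something taken from the workshop.

```r
# 1. Summarize: counts and means
cyl_counts <- table(mtcars$cyl)   # number of cars per cylinder count
print(cyl_counts)
print(mean(mtcars$mpg))           # average fuel efficiency

# 2. Fit a model: linear regression of fuel efficiency on weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))                  # intercept and slope

# 3. Statistical test using the model: is the slope different from zero?
coef_table <- summary(fit)$coefficients
print(coef_table)                 # estimates, standard errors, t and p values
```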

Modelling goes poorly in a vacuum!

  • Visualization is critical to identify problems or make sure you are asking the right question.

  • And first you need to load and tidy your data.

  • And finally you need to communicate your results!
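
The load, tidy, visualize order described above can be sketched in base R as follows. The data here is a tiny made-up table supplied inline so that the example is self-contained; in practice you would pass a file path to `read.csv()`. The workshop itself may use different packages and data.

```r
# Load: read a small hypothetical dataset (inline here; normally a file path)
csv_text <- "site,count
A,12
B,
C,30
A,18"
raw <- read.csv(text = csv_text)

# Tidy: drop rows with a missing count and make `site` a factor
tidy <- raw[!is.na(raw$count), ]
tidy$site <- factor(tidy$site)

# Visualize first: a plot can reveal problems before any modelling
boxplot(count ~ site, data = tidy, main = "Count by site")

# Then summarize
site_means <- tapply(tidy$count, tidy$site, mean)
print(site_means)
```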

(Do the workshop)

Conclusion

We’ve covered loading, touched on tidying, done some visualization and a little modelling (or at least summarization).

You still need to learn to communicate your results to your colleagues or the wider world!

Learning programming in R will supercharge your abilities!

  • functions, loops, package writing, …
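
As a small taste of the first two items, here is a sketch of a function and a loop. The temperature conversion and the name `c_to_f` are hypothetical, chosen only for illustration.

```r
# A function: convert Celsius to Fahrenheit (hypothetical example)
c_to_f <- function(celsius) {
  celsius * 9 / 5 + 32
}

# A loop: apply the function to each value in turn
temps_c <- c(0, 25, 100)
for (t in temps_c) {
  cat(t, "C is", c_to_f(t), "F\n")
}

# Arithmetic in R is vectorized, so one call can replace the loop
temps_f <- c_to_f(temps_c)
print(temps_f)
```

The vectorized call at the end is usually the more idiomatic choice in R; the loop is shown because it makes each step explicit.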